NSF PAR Search | NSF Public Access Repository

Understanding the Gain from Data Filtering in Multimodal Contrastive Learning

Pareek, D; Oh, S; Du, S (December 2025, Conference on Neural Information Processing Systems (NeurIPS) 2025)

Full Text Available
SuperBPE: Space Travel for Language Models

Liu, A; Hayase, J; Hofmann, V; Oh, S; Smith, N A; Choi, Y (April 2025, https://doi.org/10.48550/arXiv.2503.13423)

The assumption across nearly all language model (LM) tokenization schemes is that tokens should be subwords, i.e., contained within word boundaries. While providing a seemingly reasonable inductive bias, is this common practice limiting the potential of modern LMs? Whitespace is not a reliable delimiter of meaning, as evidenced by multi-word expressions (e.g., "by the way"), crosslingual variation in the number of words needed to express a concept (e.g., "spacesuit helmet" in German is "raumanzughelm"), and languages that do not use whitespace at all (e.g., Chinese). To explore the potential of tokenization beyond subwords, we introduce a "superword" tokenizer, SuperBPE, which incorporates a simple pretokenization curriculum into the byte-pair encoding (BPE) algorithm to first learn subwords, then superwords that bridge whitespace. This brings dramatic improvements in encoding efficiency: when fixing the vocabulary size to 200k, SuperBPE encodes a fixed piece of text with up to 33% fewer tokens than BPE on average. In experiments, we pretrain 8B transformer LMs from scratch while fixing the model size, vocabulary size, and train compute, varying *only* the algorithm for learning the vocabulary. Our model trained with SuperBPE achieves an average +4.0% absolute improvement over the BPE baseline across 30 downstream tasks (including +8.2% on MMLU), while simultaneously requiring 27% less compute at inference time. In analysis, we find that SuperBPE results in segmentations of text that are more uniform in per-token difficulty. Qualitatively, this may be because SuperBPE tokens often capture common multi-word expressions that function semantically as a single unit. SuperBPE is a straightforward, local modification to tokenization that improves both encoding efficiency and downstream performance, yielding better language models overall.
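The two-stage curriculum described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`count_pairs`, `apply_merge`, `train`), the `transition` parameter, and the rule "a merge crosses a word boundary when its right-hand token starts with a space" are assumptions made for illustration. It works on characters rather than bytes. Before the transition step only within-word merges are allowed (ordinary BPE with whitespace pretokenization); afterwards merges may bridge whitespace.

```python
from collections import Counter

def count_pairs(tokens, allow_superwords):
    """Count adjacent token pairs; optionally skip pairs that would cross a word boundary."""
    pairs = Counter()
    for left, right in zip(tokens, tokens[1:]):
        # Assumed convention: a word's leading space belongs to that word,
        # so a pair crosses a boundary when its right token starts with a space.
        if not allow_superwords and right.startswith(" "):
            continue
        pairs[(left, right)] += 1
    return pairs

def apply_merge(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with the concatenated token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def train(text, num_merges, transition):
    """Learn up to `num_merges` merges; superword merges are allowed from step `transition` on."""
    tokens = list(text)  # character-level start (real BPE starts from bytes)
    merges = []
    for step in range(num_merges):
        pairs = count_pairs(tokens, allow_superwords=step >= transition)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        tokens = apply_merge(tokens, best)
        merges.append(best)
    return merges, tokens

text = " by the way" * 8
# transition == num_merges: subword-only BPE baseline.
bpe_merges, bpe_tokens = train(text, num_merges=10, transition=10)
# transition < num_merges: subwords first, then superwords.
super_merges, super_tokens = train(text, num_merges=10, transition=5)
print(len(bpe_tokens), len(super_tokens))
```

On this repetitive toy string the superword run ends with fewer tokens than the subword-only run, which is the encoding-efficiency effect the abstract reports at scale.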
Full Text Available
Sharc: Simulator for Hardware Architecture and Real-time Control

Wintz, PK; Sonmez, Y; Griffioen, P; Xu, M; Oh, S; Litz, H; Sanfelice, R; Arcak, M (May 2025, Proceedings of the Conference on Hybrid Systems: Computation and Control (2025))

Full Text Available
S4S: Solving for a Diffusion Model Solver

Frankel, E; Chen, S; Li, J; Koh, P W; Ratliff, L J; Oh, S (February 2025, https://doi.org/10.48550/arXiv.2502.17423)

Diffusion models (DMs) create samples from a data distribution by starting from random noise and iteratively solving a reverse-time ordinary differential equation (ODE). Because each step in the iterative solution requires an expensive neural function evaluation (NFE), there has been significant interest in approximately solving these diffusion ODEs with only a few NFEs without modifying the underlying model. However, in the few NFE regime, we observe that tracking the true ODE evolution is fundamentally impossible using traditional ODE solvers. In this work, we propose a new method that learns a good solver for the DM, which we call Solving for the Solver (S4S). S4S directly optimizes a solver to obtain good generation quality by learning to match the output of a strong teacher solver. We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in black-box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. By exploiting the full design space of DM solvers, with 5 NFEs, we achieve an FID of 3.73 on CIFAR10 and 13.26 on MS-COCO, representing a 1.5× improvement over previous training-free ODE methods.
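The idea of optimizing a few-step solver to match a strong teacher solver can be sketched on a toy problem. Everything here is an assumption for illustration and not the S4S method itself: the ODE dx/dt = -x stands in for a diffusion model, the student is an Euler-like solver with one learnable scale per step, the teacher is plain Euler with 1000 steps, and the coefficients are fitted by finite-difference gradient descent.

```python
import numpy as np

def drift(x):
    """Toy ODE right-hand side dx/dt = -x (hypothetical stand-in for a model evaluation)."""
    return -x

def solve(x0, coeffs):
    """Euler-like solver on t in [0, 1]; coeffs[k] rescales the k-th step."""
    h = 1.0 / len(coeffs)
    x = np.array(x0, dtype=float)
    for c in coeffs:
        x = x + h * c * drift(x)
    return x

def loss(coeffs, x0, target):
    """Mean squared mismatch between the student output and the teacher output."""
    return float(np.mean((solve(x0, coeffs) - target) ** 2))

def fit_solver(x0, target, num_steps, lr=1.0, iters=300, eps=1e-5):
    """Fit per-step coefficients by gradient descent with central-difference gradients."""
    coeffs = np.ones(num_steps)  # start from plain Euler
    for _ in range(iters):
        grad = np.zeros_like(coeffs)
        for k in range(num_steps):
            bump = np.zeros_like(coeffs)
            bump[k] = eps
            grad[k] = (loss(coeffs + bump, x0, target)
                       - loss(coeffs - bump, x0, target)) / (2 * eps)
        coeffs -= lr * grad
    return coeffs

x0 = np.linspace(-2.0, 2.0, 41)          # batch of starting points
teacher_out = solve(x0, np.ones(1000))   # teacher: 1000 plain Euler steps
euler_err = loss(np.ones(2), x0, teacher_out)       # student before training (2 steps)
learned = fit_solver(x0, teacher_out, num_steps=2)  # student after training
learned_err = loss(learned, x0, teacher_out)
print(euler_err, learned_err, learned)
```

With only two function evaluations, the fitted solver reproduces the teacher's output far more closely than two plain Euler steps do, without changing the underlying drift function.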
Full Text Available
Effects of cloud geometry and metallicity on shattering and coagulation of cold gas, and implications for cold streams penetrating virial shocks

https://doi.org/10.1093/mnras/stae2771

Yao, Zhiyuan; Mandelker, Nir; Oh, S Peng; Aung, Han; Dekel, Avishai (December 2024, Monthly Notices of the Royal Astronomical Society)

Theory and observations reveal that the circumgalactic medium (CGM) and the cosmic web at high redshifts are multiphase, with small clouds of cold gas embedded in a hot, diffuse medium. We study the ‘shattering’ of large, thermally unstable clouds into tiny cloudlets of size $$\ell_{\rm shatter}\sim {\rm min}(c_{\rm s}t_{\rm cool})$$ using idealized numerical simulations. We expand upon previous works by exploring the effects of cloud geometry (spheres, streams, and sheets), metallicity, and an ionizing ultraviolet background. We find that ‘shattering’ is mainly triggered by clouds losing sonic contact and rapidly imploding, leading to a reflected shock that causes the cloud to re-expand and induces Richtmyer–Meshkov instabilities at its interface. The fragmented cloudlets experience a drag force from the surrounding hot gas, leading to recoagulation into larger clouds. We distinguish between ‘fast’ and ‘slow’ coagulation regimes. Sheets are always in the ‘fast’ coagulation regime, while streams and spheres transition to ‘slow’ coagulation above a critical overdensity, which is smallest for spheres. Surprisingly, $$\ell_{\rm shatter}$$ does not appear to be a characteristic clump size even if it is well resolved. Rather, fragmentation continues until the grid scale with a mass distribution of $$N(>m)\propto m^{-1}$$. We apply our results to cold streams feeding massive ($$M_{\rm v}\gtrsim 10^{12}\,{\rm M}_\odot$$) galaxies at $$z\gtrsim 2$$ from the cosmic web, finding that streams likely shatter upon entering the hot CGM through the virial shock. This could explain the large clumping factors and covering fractions of cold gas around such galaxies, and may be related to galaxy quenching by preventing cold streams from reaching the central galaxy.
Full Text Available
Cosmic-Ray Drag and Damping of Compressive Turbulence

https://doi.org/10.3847/1538-4357/aceef9

Bustard, Chad; Oh, S. Peng (September 2023, The Astrophysical Journal)

While it is well known that cosmic rays (CRs) can gain energy from turbulence via second-order Fermi acceleration, how this energy transfer affects the turbulent cascade remains largely unexplored. Here, we show that damping and steepening of the compressive turbulent power spectrum are expected once the damping time $$t_{\rm damp} \sim \rho v^{2}/\dot{E}_{\rm CR} \propto E_{\rm CR}^{-1}$$ becomes comparable to the turbulent cascade time. Magnetohydrodynamic simulations of stirred compressive turbulence in a gas-CR fluid with diffusive CR transport show clear imprints of CR-induced damping, saturating at $$\dot{E}_{\rm CR} \sim \tilde{\epsilon}$$, where $$\tilde{\epsilon}$$ is the turbulent energy input rate. In that case, almost all of the energy in large-scale motions is absorbed by CRs and does not cascade down to grid scale. Through a Hodge–Helmholtz decomposition, we confirm that purely compressive forcing can generate significant solenoidal motions, and we find preferential CR damping of the compressive component in simulations with diffusion and streaming, rendering small-scale turbulence largely solenoidal, with implications for thermal instability and proposed resonant scattering of $$E \gtrsim 300$$ GeV CRs by fast modes. When CR transport is streaming dominated, CRs also damp large-scale motions, with kinetic energy reduced by up to 1 order of magnitude in realistic $$E_{\rm CR} \sim E_{\rm g}$$ scenarios, but turbulence (with a reduced amplitude) still cascades down to small scales with the same power spectrum. Such large-scale damping implies that turbulent velocities obtained from the observed velocity dispersion may significantly underestimate turbulent forcing rates, i.e., $$\tilde{\epsilon} \gg \rho v^{3}/L$$.
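As a rough order-of-magnitude reading of the timescale comparison in this abstract, the damping condition can be written out explicitly. The cascade time is taken here to be the eddy turnover time at the outer scale L, which is an assumption of this sketch; the abstract only refers to "the turbulent cascade time".

```latex
% Assumption: cascade time = eddy turnover time at outer scale L, velocity v
t_{\rm cascade} \sim \frac{L}{v},
\qquad
t_{\rm damp} \sim \frac{\rho v^{2}}{\dot{E}_{\rm CR}}

% CR damping competes with the cascade once t_damp <~ t_cascade:
\frac{\rho v^{2}}{\dot{E}_{\rm CR}} \lesssim \frac{L}{v}
\quad\Longrightarrow\quad
\dot{E}_{\rm CR} \gtrsim \frac{\rho v^{3}}{L}
```

Under this reading, if the CR energy gain rate saturates near the energy input rate, the input rate can exceed the value that would be inferred from the observed velocity alone, which is the sense of the abstract's closing inequality.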
Full Text Available
The impact of Cosmic Rays on thermal and hydrostatic stability in galactic halos

https://doi.org/10.1093/mnras/stad2720

Tsung, Tsun Hin; Oh, S Peng; Bustard, Chad (September 2023, Monthly Notices of the Royal Astronomical Society)

We investigate how cosmic rays (CRs) affect thermal and hydrostatic stability of circumgalactic (CGM) gas, in simulations with both CR streaming and diffusion. Local thermal instability can be suppressed by CR-driven entropy mode propagation, in accordance with previous analytic work. However, there is only a narrow parameter regime where this operates, before CRs overheat the background gas. As mass dropout from thermal instability causes the background density and hence plasma $$\beta \equiv P_{\rm g}/P_{\rm B}$$ to fall, the CGM becomes globally unstable. At the cool disk to hot halo interface, a sharp drop in density boosts Alfvén speeds and CR gradients, driving a transition from diffusive to streaming transport. CR forces and heating strengthen, while countervailing gravitational forces and radiative cooling weaken, resulting in a loss of both hydrostatic and thermal equilibrium. In lower β halos, CR heating drives a hot, single-phase diffuse wind with velocities $$v \propto (t_{\rm heat}/t_{\rm ff})^{-1}$$, which exceeds the escape velocity when $$t_{\rm heat}/t_{\rm ff} \lesssim 0.4$$. In higher β halos, where the Alfvén Mach number is higher, CR forces drive multi-phase winds with cool, dense fountain flows and significant turbulence. These flows are CR dominated due to ‘trapping’ of CRs by weak transverse B-fields, and have the highest mass loading factors. Thus, local thermal instability can result in winds or fountain flows where either the heat or momentum input of CRs dominates.
Full Text Available
Key Physical Processes in the Circumgalactic Medium

https://doi.org/10.1146/annurev-astro-052920-125203

Faucher-Giguère, Claude-André; Oh, S. Peng (August 2023, Annual Review of Astronomy and Astrophysics)

Spurred by rich, multiwavelength observations and enabled by new simulations, ranging from cosmological to subparsec scales, the past decade has seen major theoretical progress in our understanding of the circumgalactic medium (CGM). We review key physical processes in the CGM. Our conclusions include the following:
▪ The properties of the CGM depend on a competition between gravity-driven infall and gas cooling. When cooling is slow relative to free fall, the gas is hot (roughly virial temperature), whereas the gas is cold (T ∼ 10⁴ K) when cooling is rapid.
▪ Gas inflows and outflows play crucial roles, as does the cosmological environment. Large-scale structure collimates cold streams and provides angular momentum. Satellite galaxies contribute to the CGM through winds and gas stripping.
▪ In multiphase gas, the hot and cold phases continuously exchange mass, energy, and momentum. The interaction between turbulent mixing and radiative cooling is critical. A broad spectrum of cold gas structures, going down to subparsec scales, arises from fragmentation, coagulation, and condensation onto gas clouds.
▪ Magnetic fields, thermal conduction, and cosmic rays can substantially modify how the cold and hot phases interact, although microphysical uncertainties are presently large.
Key open questions for future work include the mutual interplay between small-scale structure and large-scale dynamics, and how the CGM affects the evolution of galaxies.
Full Text Available
Cooling-driven coagulation

https://doi.org/10.1093/mnras/stad1874

Gronke, Max; Oh, S Peng (July 2023, Monthly Notices of the Royal Astronomical Society)

Astrophysical gases such as the interstellar-, circumgalactic-, or intracluster-medium are commonly multiphase, which poses the question of the structure of these systems. While there are many known processes leading to fragmentation of cold gas embedded in a (turbulent) hot medium, in this work, we focus on the reverse process: coagulation. This is often seen in wind-tunnel and shearing layer simulations, where cold gas fragments spontaneously coalesce. Using 2D and 3D hydrodynamical simulations, we find that sufficiently large ($$\gg c_{\rm s}t_{\rm cool}$$), perturbed cold gas clouds develop pulsations which ensure cold gas mass growth over an extended period of time ($$\gg r/c_{\rm s}$$). This mass growth efficiently accelerates hot gas which in turn can entrain cold droplets, leading to coagulation. The attractive inverse square force between cold gas droplets has interesting parallels with gravity; the ‘monopole’ is surface area rather than mass. We develop a simple analytic model which reproduces our numerical findings.
Full Text Available
CRISP: curriculum based sequential neural decoders for polar code family

Hebbar, A.; Vardhan, A.; Nadkarni, V.; Bhat, S.; Oh, S.; Viswanath, P. (July 2023, Proceedings of the 40th International Conference on Machine Learning)

Full Text Available
